ON  APPROXIMATE  CONFIDENCE  INTERVALS
FOR  MEASURES  OF  CONCORDANCE 

Abstract 

The  use  of  U-statistics  based  on  rank  correlation  coefficients  in 
estimating  the  strength  of  concordance  among  a  group  of  rankers  is 
examined  for  cases  where  the  null  hypothesis  of  random  rankings  is 
not  tenable.  The  studentized  U-statistic  is  asymptotically 
distribution-free,  and  the  Student-t  approximation  is  used  for 
small  and  moderate  sized  samples.  An  approximate  confidence 
interval  is  constructed  for  the  strength  of  concordance.  Monte 
Carlo  results  indicate  that  the  Student-t  approximation  can  be 
improved  by  estimating  the  degrees  of  freedom. 


I.  Introduction 

Solutions to the problem of testing for agreement among n
sets of rankings of k objects have been proposed by Kendall and
Babington-Smith [1939] and Ehrenberg [1952]. Kendall and
Babington-Smith proposed the coefficient of concordance W, which is
related to Friedman's [1937] $\chi_r^2$ for two-way analysis of variance
using ranks. Ehrenberg's statistic is the average of the Kendall rank
correlation coefficients $\tau$ between the $\binom{n}{2}$ pairs of judges.
These statistics have been studied extensively under the usual null
hypothesis of random rankings, which implies that for each judge,
each of the k! permutations of the ranks 1,...,k is equally likely
to be assigned to the k objects.

Very  little  work  has  been  done  in  the  non-null  case.  Kraemer 
[1976]  proposed  a  non-null  approximation  to  the  distribution  of  W, 
but  this  approximation  is  based  on  an  empirical  study  using  data 
generated  from  a  normal  components-of-variance  model. 

There  are  many  situations,  however,  in  which  it  is  known 
that  there  is  agreement  among  the  judges  in  the  population,  and 
the  investigator  would  like  to  estimate  the  strength  of  agreement 
among the judges. For instance, an investigator may know that
two populations of judges agree on the preference of k objects
and wish to know which group holds the preference more strongly.
What is needed in these situations is a parametric measure of the
intensity with which a preference of objects is held by a group
of judges.


II.  Internal  Rank  Correlation 


Quade [1972] proposed a measure of agreement of rankings
that is based on the expected rank correlation between a pair
of independent rankings. Denote the rankings of the objects by
a sample of n judges as $X_i = (X_{i1}, \ldots, X_{ik})'$, $i = 1, \ldots, n$.
Quade's measure of concordance, the internal rank correlation, is
given by

(2.1)  $\rho = E[R(X_i, X_j)], \quad i \neq j,$

where $X_i$ and $X_j$ are independent rankings and $R(\cdot,\cdot)$ is any rank
correlation coefficient. Two particular measures are those obtained
by using the Spearman [1904] and Kendall [1938] rank correlation
coefficients, which will be denoted as $R_s$ and $R_k$, respectively.

Under the null hypothesis of random rankings one finds that
$\rho = 0$. However, $\rho$ is positive if there is agreement among the
judges, and $\rho = 1$ when there is perfect agreement. So $\rho$ measures
the intensity with which a preference of the objects is held by the
judges. An investigator may be interested in estimating $\rho$ to compare
with a "norm" or to compare with the estimated internal rank
correlations from other populations.

The U-statistic estimator of $\rho$ is given by

(2.2)  $\bar{R} = \binom{n}{2}^{-1} \sum\sum_{1 \le i < j \le n} R(X_i, X_j),$

which Quade refers to as the average internal rank correlation.
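
The estimator in (2.2) can be sketched in a few lines. The following is a minimal illustration for untied rankings, not code from the original study; the function names `spearman`, `kendall`, and `average_internal_rank_correlation` are chosen here for illustration only.

```python
from itertools import combinations


def spearman(x, y):
    """Spearman rank correlation for two untied rankings of the same k objects."""
    k = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (k * (k * k - 1))


def kendall(x, y):
    """Kendall rank correlation for two untied rankings of the same k objects."""
    k = len(x)
    score = 0
    for a, b in combinations(range(k), 2):
        # +1 for a concordant pair of objects, -1 for a discordant pair
        score += 1 if (x[a] - x[b]) * (y[a] - y[b]) > 0 else -1
    return score / (k * (k - 1) / 2)


def average_internal_rank_correlation(rankings, corr=spearman):
    """Average of corr over all n-choose-2 pairs of judges, as in (2.2)."""
    pairs = list(combinations(rankings, 2))
    return sum(corr(x, y) for x, y in pairs) / len(pairs)
```

With two judges in perfect agreement and a third who reverses their ordering, one pair has correlation 1 and two pairs have correlation -1, so the average is -1/3 under either coefficient.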


The asymptotic ($n \to \infty$) variance of $\sqrt{n}\,\bar{R}$ is given by
$4\zeta_R$, where

(2.3)  $\zeta_R = E[\psi_R^2(X)]$

and

$\psi_R(X) = E[R(X_1, X_2) \mid X_2 = X] - \rho.$

If $\zeta_R > 0$ then Hoeffding's [1948] results show that the
limiting distribution of $\sqrt{n}(\bar{R} - \rho)/\sqrt{4\zeta_R}$ is standard
normal. Under the null hypothesis of random rankings $\zeta_R = 0$ and
the limiting distribution of $\sqrt{n}\,\bar{R}$ is degenerate. Under mild
regularity conditions given by Quade [1972], however, $\zeta_R > 0$
when $\rho > 0$.

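Since $\zeta_R$ is seldom known, one usually estimates this parameter.
A consistent estimator of $\zeta_R$ can be obtained from a method of
Sen [1960] and is given by

(2.4)  $\hat{\zeta}_R = \frac{1}{n-1} \sum_{i=1}^{n} [V_n(X_i) - \bar{R}]^2,$

where

(2.5)  $V_n(X_i) = \frac{1}{n-1} \sum_{j \neq i} R(X_i, X_j), \quad i = 1, \ldots, n,$

are the sample components of $\bar{R}$.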

The asymptotic distribution of $\sqrt{n}(\bar{R} - \rho)/\sqrt{4\hat{\zeta}_R}$ is also
standard normal under the regularity condition $\zeta_R > 0$, and this
studentized U-statistic can be used to construct approximate tests and
confidence intervals.
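
As a rough sketch of the quantities in this section, the code below computes the sample components of (2.5), a Sen-type estimate of the variance parameter, and the studentized statistic. The form used for the Sen-type estimator (sum of squared deviations of the sample components divided by n-1) is an assumption of this sketch, and all function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def spearman(x, y):
    """Spearman rank correlation for two untied rankings."""
    k = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (k * (k * k - 1))


def sample_components(rankings, corr=spearman):
    """V_n(X_i): mean correlation of judge i with each of the other n-1 judges."""
    n = len(rankings)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        r = corr(rankings[i], rankings[j])
        totals[i] += r
        totals[j] += r
    return [t / (n - 1) for t in totals]


def sen_estimate(components):
    """Return (R-bar, zeta-hat); zeta-hat uses divisor n-1 (assumed form)."""
    n = len(components)
    r_bar = sum(components) / n
    zeta_hat = sum((v - r_bar) ** 2 for v in components) / (n - 1)
    return r_bar, zeta_hat


def studentized(r_bar, zeta_hat, n, rho0):
    """sqrt(n) (R-bar - rho0) / sqrt(4 zeta-hat), compared with t or normal quantiles."""
    return sqrt(n) * (r_bar - rho0) / sqrt(4.0 * zeta_hat)
```

The mean of the sample components equals the average internal rank correlation, so `sen_estimate` recovers the point estimate as a by-product.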

III.  Refinement  of  Interval  Estimation 

The  distributions  of  studentized  U-statistics  are  often 
approximated  by  the  Student-t  distribution  on  n-1  degrees  of  freedom. 
Using this approximation, one can obtain an approximate 100(1-α)%
confidence interval for $\rho$ as

$\bar{R} - t_{\alpha/2}(n-1)\sqrt{4\hat{\zeta}_R/n} < \rho < \bar{R} + t_{\alpha/2}(n-1)\sqrt{4\hat{\zeta}_R/n},$

where $t_\alpha(\nu)$ is the $(1-\alpha)$th quantile of the Student-t distribution
on $\nu$ degrees of freedom.

The choice of n-1 degrees of freedom seems reasonable since

$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} V_n(X_i).$

However, the sample components are not independent, and the
approximation using n-1 degrees of freedom can lead to problems of
undercoverage when estimating $\rho$.

To illustrate this we use a model introduced by Mallows [1957]
and later studied by Feigin and Cohen [1978]. Let $x_0$ be a fixed
vector with one of the k! orderings of the integers 1,...,k, and for
every possible ranking $x$ let $d(x_0, x)$ denote a "distance" (in a rank
correlation sense) between $x_0$ and $x$. A model which assigns equal
probabilities to rankings with the same value of d is then

(3.1)  $P_\theta(x) = c(\theta)\,\theta^{d(x_0, x)}, \quad 0 < \theta < 1,$

where

$c(\theta) = \left[\sum_{x} \theta^{d(x_0, x)}\right]^{-1},$

the summation being over all k! possible rankings. The smaller the
rank correlation of $x$ with $x_0$, the smaller the probability of
occurrence of $x$. The extreme of $\theta = 0$ corresponds to perfect
concordance and $\theta = 1$ corresponds to random rankings.
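
The model (3.1) can be tabulated directly for small k. The sketch below assumes a Kendall-type distance equal to the number of discordant pairs of objects; the source does not state the exact distance used, so `kendall_distance` is an illustrative choice, as are the other names.

```python
from itertools import combinations, permutations


def kendall_distance(x0, x):
    """Number of object pairs ordered differently by x0 and x (assumed distance)."""
    return sum(
        1
        for a, b in combinations(range(len(x)), 2)
        if (x0[a] - x0[b]) * (x[a] - x[b]) < 0
    )


def mallows_pmf(k, theta, dist=kendall_distance):
    """Probabilities c(theta) * theta**d(x0, x) over all k! rankings, x0 = (1,...,k)."""
    x0 = tuple(range(1, k + 1))
    weights = {x: theta ** dist(x0, x) for x in permutations(x0)}
    c = 1.0 / sum(weights.values())  # normalizing constant c(theta)
    return {x: c * w for x, w in weights.items()}
```

For k = 3 the distances from $x_0$ are 0, 1, 1, 2, 2, 3, so with θ = .5 the normalizing sum is 2.625 and $x_0$ receives probability 1/2.625. Setting θ = 1 gives each ranking probability 1/6, the random-rankings case.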

Data were generated from this model at various values of $\theta$
using distances based on both the Spearman and Kendall rank
correlation coefficients. These simulations were performed on
the CDC 6600 computer at Southern Methodist University. The
possible rankings were denoted by $x_i$, $i = 1, \ldots, k!$, and the $x_0$
vector used was $x_0 = (1, 2, \ldots, k)'$. A uniform (0,1) observation
u was generated by using the CDC pseudorandom number generator
RANF and the generated ranking was then that ranking $x_i$ which
satisfied

$F_{i-1} < u \le F_i,$

where $F_0 = 0$ and $F_i = \sum_{j=1}^{i} P_\theta(x_j)$ for $i = 1, \ldots, k!$.
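
The sampling step described above is the inverse-CDF method for a discrete distribution. A minimal sketch follows; it takes the uniform observation u as an argument, so any generator can stand in for RANF (Python's `random.random()` would be one substitute, which is an assumption of this sketch and not the generator used in the study). The function name is illustrative.

```python
from itertools import accumulate


def draw_ranking(rankings, probs, u):
    """Return the ranking x_i whose cumulative probabilities satisfy F_{i-1} < u <= F_i."""
    for x, cumulative in zip(rankings, accumulate(probs)):
        if u <= cumulative:
            return x
    # Guard against floating-point rounding when u is very close to 1.
    return rankings[-1]
```

A sample of n judges is then obtained by calling `draw_ranking` n times with independent uniform observations.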

For k = 4, 1000 samples of size n = 25 were generated for
several values of $\theta$, and approximate 95% confidence intervals
were obtained for the parameters $\rho_s$ and $\rho_k$, which are the
population internal rank correlation measures when using $R_s(\cdot,\cdot)$
and $R_k(\cdot,\cdot)$, respectively.

The empirical coverages that were obtained from these simulations
are given in Table 1. The standard error of these proportions
at the nominal level is .0069. Most of these coverages are
significantly less than .95. Larger samples were also generated for
two configurations using the Kendall distance, and the empirical
coverages are given in Table 2. These show that the coverages
improve when the sample size becomes quite large.
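
The quoted standard error is the binomial standard error of a proportion at the nominal level with 1000 replications. A short check, with illustrative names, also shows how far a given empirical coverage lies from .95 in standard-error units:

```python
from math import sqrt


def coverage_standard_error(nominal, n_sim):
    """Binomial standard error of an empirical coverage at the nominal level."""
    return sqrt(nominal * (1.0 - nominal) / n_sim)


def standardized_shortfall(observed, nominal, n_sim):
    """(observed - nominal) expressed in standard-error units."""
    return (observed - nominal) / coverage_standard_error(nominal, n_sim)
```

For example, an empirical coverage of .923 lies nearly four standard errors below .95.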

For small and moderate sized samples, however, the intervals
are too short. This problem could be due to the choice of n-1
degrees of freedom in the t-approximation. One method of improving
the coverage is that of estimating the degrees of freedom. Hinkley
[1977] has proposed a method of estimating the degrees of freedom
for the studentized jackknife estimator. This method can be used
in the present setting to adjust the interval width only, since $\bar{R}$
is invariant to the jackknife procedure.


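The pseudovalues for jackknifing $\bar{R}$ can be shown to be

(3.2)  $P_i = \frac{2(n-1)\,V_n(X_i) - n\bar{R}}{n-2}, \quad i = 1, \ldots, n,$

so that the jackknife estimator of the variance of $\bar{R}$ is

(3.3)  $V_R = \frac{1}{n(n-1)} \sum_{i=1}^{n} (P_i - \bar{R})^2 = \frac{4(n-1)}{n(n-2)^2} \sum_{i=1}^{n} [V_n(X_i) - \bar{R}]^2.$

Since $nV_R$ is also a consistent estimator of $4\zeta_R$, then
$(\bar{R} - \rho)/\sqrt{V_R}$ is asymptotically standard normal if $\zeta_R > 0$.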
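
The jackknife quantities can be checked numerically. The sketch below computes the pseudovalues directly from the leave-one-out definition and compares them with a closed form in terms of the sample components; that closed form is derived for this sketch from the leave-one-out definition and is not quoted from the source. All names are illustrative, and at least three judges are required.

```python
from itertools import combinations


def spearman(x, y):
    """Spearman rank correlation for two untied rankings."""
    k = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (k * (k * k - 1))


def u_statistic(rankings, corr=spearman):
    """Average of corr over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(corr(x, y) for x, y in pairs) / len(pairs)


def pseudovalues_direct(rankings, corr=spearman):
    """n * R-bar - (n-1) * (R-bar with judge i deleted), for each judge i."""
    n = len(rankings)
    r_bar = u_statistic(rankings, corr)
    return [
        n * r_bar - (n - 1) * u_statistic(rankings[:i] + rankings[i + 1:], corr)
        for i in range(n)
    ]


def pseudovalues_from_components(rankings, corr=spearman):
    """Closed form (derived for this sketch): [2(n-1) V_i - n R-bar] / (n-2)."""
    n = len(rankings)
    r_bar = u_statistic(rankings, corr)
    comps = [
        sum(corr(rankings[i], rankings[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return [(2 * (n - 1) * v - n * r_bar) / (n - 2) for v in comps]


def jackknife_variance(pseudo):
    """Sum of squared deviations of the pseudovalues divided by n(n-1)."""
    n = len(pseudo)
    mean = sum(pseudo) / n
    return sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1))
```

The mean of the pseudovalues equals the original estimate, which is why the jackknife changes only the variance estimate and not the point estimate.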


Hinkley's  estimator  for  the  degrees  of  freedom  is  given  by 


(3.4)  fn  - 


where 

(3.5)  K  - 
n 


n(n-l)(n-2)2 


[The expression for $K_n$ given in the Hinkley paper contains a slight
error in the coefficient of $V_R^2$, and the correct expression is given
here in (3.5).] The estimated degrees of freedom can also be
expressed in terms of the sample components as


The  jackknife  variance  estimator  and  the  degrees  of  freedom 
estimator  were  used  on  the  data  that  were  generated  for  Table  1. 

Table  3  gives  the  empirical  coverages  that  were  obtained  from  these 
modified  approximate  95%  confidence  intervals.  A  comparison  of  these 
results  with  those  in  Table  1  shows  that  the  empirical  coverages 
have  improved  in  almost  every  configuration.  Simulations  from 
samples of size n = 10 also show an improvement in coverage when using
the  estimated  degrees  of  freedom. 

Table 4 gives the average lengths of the confidence intervals
for $\rho_s$ and $\rho_k$ that were obtained when generating samples from the
model using the Kendall distance measure. These average lengths are
larger when estimating the degrees of freedom than when using n-1
degrees of freedom. However, this is expected since the empirical
coverages have increased.

The  minimum  estimated  degrees  of  freedom  are  between  2  and  4 
for  most  of  the  samples  generated,  and  the  minimum  among  all  generated 
samples  is  2.0.  However,  the  average  estimated  degrees  of  freedom  for 
some  models  was  greater  than  n-1,  as  can  be  seen  in  Table  5.  So  there 
are  many  instances  where  the  estimated  degrees  of  freedom  are  larger 
than  n-1.  In  these  cases  the  lengths  of  the  confidence  intervals  are 
smaller  when  estimating  the  degrees  of  freedom  than  when  using  n-1 
degrees  of  freedom.  Nevertheless,  as  Tables  4  and  5  show,  even  when 
the  average  estimated  degrees  of  freedom  exceeds  n-1,  the  associated 
average  confidence  interval  is  not  shorter. 


IV.  Example  Application 


An investigator is interested in estimating the strength of
agreement  of  male  college  students  on  the  importance  ordering  of 
seven  basic  human  needs.  These  needs  are 

A)  Self-actualization 

B)  Cognitive  needs 

C)  Physiological  needs 

D)  Aesthetic  needs 

E)  Esteem  needs 

F)  Belongingness  and  love,  and 

G)  Safety  needs. 

A  sample  of  15  male  college  students  was  obtained,  and  each  student 
ranked  the  needs  based  on  the  criterion  of  importance.  These  rankings 
are  given  in  Table  6. 

The Spearman rank correlation coefficients $R_s(X_i, X_{i'})$ are found
for every pair of rankings, and from these one obtains the sample
components $V_n(X_i)$, which are also given in Table 6. This leads to

$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} V_n(X_i) = .23979$

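and

$\hat{\zeta}_R = \frac{1}{n-1} \sum_{i=1}^{n} [V_n(X_i) - \bar{R}]^2 = .00910.$

Then the variance of $\bar{R}$ is estimated by

$V_R = \frac{4(n-1)^2}{n(n-2)^2}\,\hat{\zeta}_R = .00281.$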


To estimate the degrees of freedom for the t approximation we
need

$\sum_{i=1}^{n} [V_n(X_i) - \bar{R}]^4 = .00286.$


Using (3.6), the degrees of freedom are estimated by

$\hat{f}_n = 14.69.$


Since $t_{.025}(14.69) \approx 2.135$, an approximate 95% confidence
interval for $\rho$ is

$.23979 - (2.135)\sqrt{.00281} < \rho < .23979 + (2.135)\sqrt{.00281},$

i.e. $.1266 < \rho < .3530$.
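
The interval arithmetic can be verified from the three reported quantities. The sketch below takes the t quantile 2.135 as given in the text and does not recompute it; the variable names are illustrative.

```python
from math import sqrt

r_bar = 0.23979      # average internal rank correlation from the example
var_hat = 0.00281    # estimated variance of R-bar from the example
t_quantile = 2.135   # t quantile reported in the text for 14.69 degrees of freedom

half_width = t_quantile * sqrt(var_hat)
lower = r_bar - half_width
upper = r_bar + half_width
```

Rounded to four decimals, `lower` and `upper` reproduce the limits .1266 and .3530.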

This  gives  an  approximate  95%  confidence  interval  for  the  strength 
of  agreement  among  male  college  students  on  the  ordering  of  the 
seven  basic  human  needs.  Notice  that  for  this  data  set  the  estimated 
degrees  of  freedom  statistic  is  quite  close  to  n-1.  As  the  Monte 
Carlo  results  suggest,  other  samples  from  this  same  population  may 
be  expected  to  yield  values  of  f  that  differ  considerably  from  n-1 
in  either  direction. 


V.  Conclusions 

Knowledge of the parameter $\rho$ can be very useful to an investigator
who wants to determine the strength of agreement among a
population on the rank-order preference of k objects. This parameter
can be estimated without putting model constraints on the rankings
since the U-statistic estimator is asymptotically distribution-free.
The estimation of $\rho$ can also be improved by using Hinkley's method
of  estimating  the  degrees  of  freedom  for  the  Student-t  approximation 
to  the  distribution  of  the  studentized  U-statistic.  This  method 
provides  better  coverage  by  increasing  the  lengths  of  intervals  that 
are  too  short,  and  it  can  lead  to  more  accurate  estimation  in  many 
cases  by  decreasing  the  lengths  of  some  intervals  that  are  too  long. 

TABLE 1

Empirical Coverage of Approximate 95% Confidence
Intervals for $\rho_s$ and $\rho_k$ with k = 4 and n = 25
(1,000 Simulations)

              Kendall distance          Spearman distance
$\theta$      $\rho_s$     $\rho_k$     $\rho_s$     $\rho_k$

  .2          .923         .936         .938         .936
  .3          .906         .928         .921         .929
  .4          .938         .937         .919         .925
  .5          .929         .932         .924         .940
  .6          .917         .915         .929         .951
  .7          .929         .930         .924         .927
  .8          .921         .922         .934         .939
  .9          .934         .952         .916         .916

TABLE 3

Empirical Coverage of Approximate 95% Confidence Intervals
for $\rho_s$ and $\rho_k$ using Estimated Degrees of Freedom
(1,000 Simulations)

              Kendall distance          Spearman distance
$\theta$      $\rho_s$     $\rho_k$     $\rho_s$     $\rho_k$

  .2          .946         .951         .939         .938
  .3          .927         .941         .970         .980
  .4          .953         .951         .938         .942
  .5          .938         .937         .937         .948
  .6          .927         .926         .951         .958
  .7          .935         .938         .939         .941
  .8          .924         .928         .941         .947
  .9          .934         .953         .926         .926